Fix create nodepool azure command #1118
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: alvaroaleman
✔️ Deploy Preview for hypershift-docs ready! 🔨 Explore the source changes: 37f6899 🔍 Inspect the deploy log: https://app.netlify.com/sites/hypershift-docs/deploys/6228ea59c1057d0008d0cab3 😎 Browse the preview: https://deploy-preview-1118--hypershift-docs.netlify.app/reference/api
Branch updated from 5224235 to 8c6d4f6.
The default boot image will vary depending on the OCP release chosen for each NodePool. Otherwise, a new NodePool that starts from a very old boot image might eventually break OS upgrade compatibility. We should keep it in the NodePool and, if it is not set, find the default via the release info. Is there anything preventing us from doing the same as AWS, as in the links below? hypershift/hypershift-operator/controllers/nodepool/nodepool_controller.go Lines 907 to 913 in ca08c87
No, it does not. It gets created once and never updated. What changes is the RHCOS version the node then gets updated to. This matches what the installer does: https://github.com/openshift/installer/blob/8fca1ade5b096d9b2cd312c4599881d099439288/data/data/azure/vnet/main.tf#L94
Yes: there is no public image, only a public blob which can be used to back an image (I believe even for that it needs to be copied first, but I haven't verified that).
Ok, so the installer infers the boot image from Then copies to https://github.com/openshift/installer/blob/master/pkg/asset/machines/azure/machines.go#L118
That's the same problem whether you expose it in NodePool or in HostedCluster (say we copy the image once and always point to the same source from all NodePools), right? I still think that, ideally, a brand new NodePool shouldn't come up with the original HostedCluster boot image but should rather discover the default boot image belonging to the .releaseImage of that NodePool, though I agree we wouldn't want to manage copying each image. @patrickdillon @cgwalters Can you refresh my memory: is there still any plan for the installer/MCO to manage the lifecycle of boot images somehow (openshift/enhancements#201)?
A cluster might have zero pools, at which point we are stuck, because we cannot create a new NodePool without an existing NodePool. This is why I moved it to the HostedCluster. If we end up needing arch-specific images, we will need to have the
I think we still want to execute on that, and it may actually turn out to be a near-hard dependency of openshift/enhancements#1032. But we landed openshift/installer#4760, which was definitely intended to be used by hypershift. Is that not sufficient?
I'm not sure what you mean by "we are stuck". The NodePool API lets you create a NodePool at any time by specifying its boot image as input. I think we should preserve the ability for consumers to specify a different image per NodePool and keep the decoupling (multi-arch, fresh boot image, testing alpha images, consumer custom images...). NodePools are by definition heterogeneous. Now, how to automate the process of discovering and copying the image to a known location to be consumed as API input, and how we provide the best UX on the CLI, is a separate discussion which should neither impact nor drive our API design. We can also keep it pretty much as it is and let HostedCluster be the fallback if the boot image is not specified in the NodePool (now, or add this in the future), which makes sense from an API point of view and incidentally solves the CLI automation problem. Alternatively, I'm curious how expensive it would be for the NodePool to own discovering the boot image from the payload and copying it.
This was never fully implemented. In order to fix it, the bootImageId was moved from the nodepool to the cluster, because it is unique per cluster and never changes. Otherwise if there is no nodepool, we are unable to find it.
Branch updated from 8c6d4f6 to 37f6899.
@enxebre updated to calculate the bootImage if unset. That is a good idea; I hadn't thought about that.
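The fallback discussed above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `AzureNodePoolPlatform` is a hypothetical, trimmed-down stand-in for the Azure section of the NodePool spec, and `resolveBootImage` and the example image ID are invented names. It only shows the precedence: an explicit per-NodePool image wins, otherwise the computed default is used.

```go
package main

import (
	"errors"
	"fmt"
)

// AzureNodePoolPlatform is a hypothetical stand-in for the Azure part of a
// NodePool spec; only the field discussed in this thread is kept.
type AzureNodePoolPlatform struct {
	ImageID string // optional: explicit boot image for this NodePool
}

// resolveBootImage returns the explicit per-NodePool image when it is set and
// falls back to the computed default otherwise. It errors out only when
// neither value is available.
func resolveBootImage(np AzureNodePoolPlatform, defaultImageID string) (string, error) {
	if np.ImageID != "" {
		return np.ImageID, nil
	}
	if defaultImageID == "" {
		return "", errors.New("no boot image set on the NodePool and no default available")
	}
	return defaultImageID, nil
}

func main() {
	// "example-default-image" is a placeholder, not a real image ID.
	id, err := resolveBootImage(AzureNodePoolPlatform{}, "example-default-image")
	fmt.Println(id, err)
}
```

Keeping the default as a plain input keeps the precedence rule independent of how the default itself is discovered or constructed.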
@@ -423,11 +423,13 @@ spec:
minimum: 16
type: integer
imageID:
description: 'ImageID is the id of the image to boot from.
If unset, the default image at the location below will be
Having a hardcoded default seems like a long term bad idea. I think we should error out if we can't find the right image instead.
There are no public Azure images, only public Azure image blobs, which need to be copied and referenced from an image in order to be usable. What we construct here is that reference. If we didn't do this, we would have to store the location somewhere even when we have no NodePools. That would leave only the cluster, which doesn't really seem appropriate and might cause issues if we need more images in the future, for example because of a different arch or because we have a Windows node.
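To illustrate what "constructing that reference" could look like: Azure managed images are addressed by a resource ID of the form `/subscriptions/<id>/resourceGroups/<name>/providers/Microsoft.Compute/images/<name>`. The sketch below assumes the image was copied into a known resource group. The function name `defaultImageID` and the example values are hypothetical and are not taken from this PR.

```go
package main

import "fmt"

// defaultImageID builds the Azure resource ID of a managed image from its
// subscription, resource group and image name. Which image name is used by
// default is an assumption left to the caller.
func defaultImageID(subscriptionID, resourceGroup, imageName string) string {
	return fmt.Sprintf(
		"/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Compute/images/%s",
		subscriptionID, resourceGroup, imageName,
	)
}

func main() {
	// All three values are placeholders for illustration only.
	fmt.Println(defaultImageID("00000000-0000-0000-0000-000000000000", "example-rg", "example-rhcos-image"))
}
```

Because the ID is derived from values the cluster already knows, nothing needs to be stored when there are zero NodePools.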
Having a hardcoded default seems like a long term bad idea. I think we should error out if we can't find the right image instead.
We default the backend for a zero friction happy path while still enabling other required scenarios by exposing input at the API.
In any case, if the targeted image can't be found, the CAPI Machine will fail and the error will be bubbled up as a NodePool.status.condition (eventually; there is a PR in flight in CAPI upstream for bubbling up individual Machine errors to MachineDeployments). Alternatively, we could explicitly check that the image exists from our controller and fail early. However, we intend to keep our controller as slim and declarative as possible and delegate logic and implementation to CAPI, so letting the Machine fail and signalling the error back seems reasonable to me.
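As a rough sketch of the "let the Machine fail and signal the error back" approach: the types below are simplified, hypothetical stand-ins rather than the real CAPI or HyperShift API types, and the condition type and reason strings are invented for illustration. The idea is only that per-Machine failure messages get aggregated into a single condition on the owning object.

```go
package main

import (
	"fmt"
	"strings"
)

// Machine is a hypothetical, simplified view of a machine's status.
type Machine struct {
	Name           string
	FailureMessage string // empty when the machine has not failed
}

// Condition is a hypothetical, simplified status condition.
type Condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// summarizeMachines folds individual machine failures into one condition.
// The type and reason names are illustrative assumptions.
func summarizeMachines(machines []Machine) Condition {
	var failures []string
	for _, m := range machines {
		if m.FailureMessage != "" {
			failures = append(failures, fmt.Sprintf("%s: %s", m.Name, m.FailureMessage))
		}
	}
	if len(failures) == 0 {
		return Condition{Type: "AllMachinesReady", Status: "True", Reason: "AsExpected"}
	}
	return Condition{
		Type:    "AllMachinesReady",
		Status:  "False",
		Reason:  "MachineFailed",
		Message: strings.Join(failures, "; "),
	}
}

func main() {
	cond := summarizeMachines([]Machine{
		{Name: "m1"},
		{Name: "m2", FailureMessage: "image not found"},
	})
	fmt.Printf("%+v\n", cond)
}
```

With this shape the controller does not need to query Azure for the image itself; a missing image surfaces through the failed machine's message.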
Thanks! Let's get it to the merge pool; we can always follow up if Colin or anyone else has more feedback.
/retest-required Please review the full test history for this PR and help us cut down flakes.
@alvaroaleman: all tests passed!